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Abstract 

There are several applications in computational biophysics which require the optimization 
of discrete interacting states; e.g., amino acid titration states, ligand oxidation states, or discrete 
rotamer angles. Such optimization can be very time-consuming as it scales exponentially in the 
number of sites to be optimized. In this paper, we describe a new polynomial-time algorithm 
for optimization of discrete states in macromolecular systems. This algorithm was adapted 
from image processing and uses techniques from discrete mathematics and graph theory to 
restate the optimization problem in terms of “maximum flow-minimum cut” graph analysis. 
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The interaction energy graph, a graph in which vertices (amino acids) and edges (interactions) 
are weighted with their respective energies, is transformed into a flow network in which the 
value of the minimum cut in the network equals the minimum free energy of the protein, and 
the cut itself encodes the state that achieves the minimum free energy. Because of its deter¬ 
ministic nature and polynomial-time performance, this algorithm has the potential to allow for 
the ionization state of larger proteins to be discovered. 


Introduction 

There are many problems in computational physics and biophysics which require optimization over 
an exponentially large state space. In this paper we demonstrate an algorithm adapted from com¬ 
puter vision for optimization over an exponentially large space in polynomial time for pairwise- 
decomposable interactions between states. We focus on the problem of macromolecular titration 
state assignment; however, there are many other exponential space optimization problems in com¬ 
putational biophysics, including inverse folding, 1 2 protein design, 3 ' 6 and protein structure opti¬ 
mization. 7 8 

Because hydrogens are rarely present in x-ray crystallographic structures, protein titration 
states often need to be computationally assigned to each titratable amino acid, including N- and 
C-termini, in the molecule. 9 The bases for most modern protein p K a were established by Bash- 
ford and co-workers who developed both brute force and Monte Carlo procedures for generating 
titration curves 10 11 . The p K a value of an amino acid or residue, together with the solution pH, 
provides a measure of the probability of protonation for a titration state: p K a = — log 10 ^ fl , where 
K a is the acid dissociation equilibrium constant for the residue. Experimental methods provide 
the best mechanisms for determining both p K a values and titration states of a protein residue, 12 14 
but experimental work is both time- and resource-consuming, so computational methods offer a 
compelling alternative for estimating p K a values and assigning titration states using a variety of 
physics- and statistics-based methods. 15 However, these calculations can be computationally de¬ 
manding as they require the calculation of all & ()V 2 ) pairwise interactions between all N titratable 
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residues, to determine intrinsic site pK a values,E- 6 followed by optimization over the ()’ (2 ,v ) poten¬ 
tial titration states of the system. 

There are several approaches to such optimization, including sampling approaches such as 
Monte Carlo simulation 10 11 17 '" 22 , molecular dynamics/ 23 26 and deterministic optimization meth¬ 
ods such as dead-end elimination (DEE). 27 28 Sampling methods explore the optimization land¬ 
scape using random move sets or trajectories generated from force-field-based equations of mo¬ 
tion. These methods have the advantage of generating thermodynamic ensembles through their 
sampling and are able to sample systems with complex energy functions; however, these methods 
are not guaranteed to find global minima. In contrast, the DEE approach - and its variants such 
as DEEPet^ - are global optimization approaches. However, these approaches are restricted to 
pairwise-decomposable energy functions to accelerate the search, thus limiting the complexity of 
energy functions for the system. 

The approach we describe in this paper is most closely related to DEE and its variants. Like 
DEE, we are guaranteed to find a minimum energy state through a deterministic, polynomial time 
process. The DEE algorithm scales cubically with the total number of rotamers in the system. The 
algorithm we describe in this paper relies on the use of the max-flow/min-cut theorem. There are 
new algorithms that approximate the minimum cut in roughly linear time in the number of edges, 29 
which results in an algorithm which is quadratic in the number of titratable residues. As in DEE, 
we are currently limited by pairwise-decomposable energy functions, however with more research 
we hope to be able to extend this work to energy functions with higher order interactions (ternary, 
etc.), possibly through the use of hypergraphs. 

Methods 

Energy functions for titration state assignment 

In this initial work, we will be following the simple and common approach 16 of assigning titration 
states to a rigid protein, wherein the backbone and amino acid locations are fixed. It is important 
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to note that, for the current paper, we are assigning specific titration states and not assigning p K a 
values or titration probabilities. Titration state assignment is important for setting up a variety of 
constant-titration calculations such as standard molecular dynamics simulations, docking simula¬ 
tions, etc. The full calculation of titration curves is a longer-term application of this method. All 
titratable amino acids, with the exception of histidine (HIS), are assumed to have two possible 
states: protonated or deprotonated. This assumption ignores (or assumes equivalent) the various 
tautomers and conformers that can be present for many amino acids; these cases will be addressed 
in future work. Histidine has three possible titration states which should not be considered equiva¬ 
lent: 30 a singly protonated state with a hydrogen on N e , a singly protonated state with a hydrogen 
on Ng, and a doubly protonated state with hydrogens on both nitrogens. The state in which both 
N () - and N e are deprotonated is highly energetically unfavorable and thus will be ignored. 

For N titratable residues, the set of all protonation states, of any protein without HIS can 
be described as the set of all {0,1} vectors of length N\ i.e., & — {0,1}^. If there are M HIS 
residues then & — {0, \} N ~ M x {0,1,2} M . Our goal in titration state assignment is to find a 
titration state P in g? which minimizes the protein’s energy at a given pH (or proton activity), 
volume, and temperature T. This free energy is often approximated as a pairwise-decomposable 
function between titration sites: 31 32 

N N N 

g(p ) = £ jiin(io)tr( P H- pC' ( ) + £ £ y,-tjU im (P h Pj) 0) 

/=1 i=\ j=\ 

where Jt is 1 when amino acid i is charged and 0 otherwise, p/£,'/’ [ ( is the intrinsic p K a of amino acid 
i, and U int (Pj. Pj) is the interaction energy between amino acids i and j. The intrinsic p K a value of 
amino acid i is the p K a value if all other amino acids are in their neutral state. Our formulation of 
the free energy is slightly different, though equivalent. 

Traditionally, the p K a of a given residue can be determined from the titration curve for that 
residue. The titration curve is a plot of fractional proton occupancy vs. pH, and the pH value 
at which the fractional proton occupancy is 1/2 will give the p K a value. When there is a greater 
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than 0.5 probability that the residue is protonated (resp. deprotonated), then we consider it proto- 
nated (resp. deprotonated). However, these fractional proton occupancies 32 are computationally 
intensive to compute since they require computation of energy of the protein in all 2 N states: 


fi 


y2 \ rexv G[PH 

Lj=\ kT 

y2 N exn^T 

Lj=l ex P kT 


( 2 ) 


where f t is the fractional proton occupancy of residue i, j runs through all of the 2 N protonation 
states, ji is 1 if residue i is charged in state j and zero otherwise, P J is the j th protonation state, and 
G(P-i) is the energy described in Eq.[l]for protonation state P J . 

Instead of computing fractional occupancy via the ensemble average above, we follow basic 
two-state linkage analysis to approximate the protonated fraction. 33 34 We define G p (i ) to be the 
change in energy between the states where residue i is protonated and where it is deprotonated. 
Given this definition, the fraction of protonated state is given as 


e -P G P (i) 

epi ' ,) = l+OHe-W)’ 


(3) 


where /3 = (RT )~ 1 . R is the gas constant, T is the temperature, and a h is the activity of the proton. 
The form for HIS residues is similar but differs slightly due to the three-state system. Below we 
drop the i from the formulas when the residue is clear, for easier readability. If we assume the 
anionic state of HIS is energetically prohibited, then there are three potential states for the system 


1 +e~P AG + ane 

e-PAC 

1 + e~P AG + a]je~P G p ’ 
a H e~P G p 

1 + e~$ AG + ciHe~P G p ’ 


(4) 

(5) 

( 6 ) 


where AG is the difference in energy between the 8 and £ states of HIS, 0§ is the fraction of states 
with HIS having a single proton on the 8 nitrogen, 0 e is the fraction of states with HIS having a 
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single proton on the £ nitrogen, and 9 p is the fraction of states with doubly protonated HIS. Our 
method relies on finding a minimum energy state for the protein at any given pH value, changing 
the state of each residue, and recording the change in energy. The method in this paper is focused 
on finding that minimum energy state efficiently. We then use that information to calculate the 
titration curve using Eq. [3] and Eq. [6j and subsequently the p K a value for each residue from the 
titration curves. 

In order to calculate the individual energies contained in Eq. |T[ we employ PDB2PKA, a part 
of the PDB2PQR 35 36 package based on the p K a methods of Nielsen et alT 7 The interaction en¬ 
ergy U(Pj,Pi) in Eq. |Tj includes background and desolvation energies, while U(Pj,Pj ) for i / j 
includes Coulombic and steric interaction energies 35 37 between sites. The background energy of 
site i is simply the energy of the site if all other contributions (other amino acids and the solvent) 
are removed. The desolvation energy, on the other hand, quantifies the interaction between the 
amino acid and the solvent, assuming all other amino acids are in their neutral state. The electro¬ 
static contributions to these energies are calculated via the Poisson-Boltzmann equation through 
the software package APBS;^ 8 the steric interaction energies are calculated via PDB2PQR. 35 36 
When two amino acid states have steric clashes, a “bump” term is added in the form of a large un¬ 
favorable energy contribution to reduce the probability of the two states happening simultaneously. 
In the current work, the PARSE 39 40 forcefield is used for protein radii and charge parameters. 

In addition to calculating interaction, background, and desolvation energies, PDB2PKA con¬ 
tains a Metropolis Monte Carlo for calculating titration curves and p K a values. The PDB2PKA 
Monte Carlo algorithm starts with a random titration state for each residue in the protein. For each 
step, the algorithm selects a random titration state for a random residue and calculates the energy 
difference AGi-ij with respect to the previous step. If the energy is lower, the random titration 
move is accepted, otherwise it is accepted with a probability equal to e^ AGl - ] l . The last 90% of the 
steps in the Monte Carlo simulation are used to estimate fractional proton occupancy by calculating 
the fraction of Monte Carlo steps in which the residue was protonated. 

In what follows, we compare this Monte Carlo approach with our energy minimization ap- 
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proach for the purpose of demonstrating our new optimization algorithms and qualitatively assess¬ 
ing their accuracy. However, we recognize that these two algorithms use different approaches for 
p K a calculations. The PDB2PKA Monte Carlo approach samples multiple states by implicitly cal¬ 
culating the probabilities presented in Eq.|2j Our approach uses Eq.[3]to directly calculate the p K a 
value, effectively neglecting the entropic contributions of multiple energetically accessible titration 
states. In future work, we plan to improve our algorithm to use the calculated minimum energy 
state as the reference structure for Monte Carlo sampling. 

Discrete optimization and graph theory 

Research efforts in the field of computer vision (e.g., image restoration, image synthesis, image 
segmentation, multi-camera scene reconstruction, and medical imaging) focused on efficient algo¬ 
rithms for energy minimization via graph cuts in networks. 41 43 Such applications often focused 
on the restoration of an image (a collection of pixels and possible pixel labels); e.g., the image 
may contain noise to be removed, sections to be segmented, or disconnected images that need to 
be integrated into a single image. These algorithms are designed to minimize an energy function 
which assigns an energy to each pixel based on its label (e.g., hue, intensity, segment membership, 
etc.) and to each pair of interacting pixels (usually neighbors) based on an interaction function. 
Thus, the energy function takes the form 

E(L) = £ Ei(L,) + ££ ZjfaLj) (7) 

;=1 

where L = (L\, L?,..., L,v), L, is the label of pixel i, and S is the list of pixel interactions, such that 
pixels i and j are said to interact if (/. j) e S. Notice that our protein energy function Eq. [ljcan be 
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written in this form with 


Ei = YMlO)kT(pH-pKfJ + yfU^iP^P,), 
Eij YiYjU mt (Pi,Pj), 

& = {(hj) ■ 1 ± ./}■ 


As a result of this similar pairwise-interaction form, we can apply the discrete minimization tech¬ 
niques established for computer vision to our protein problem. In the case of two-state labels, 
where amino acids can be only protonated or deprotonated, we can use these computer vision op¬ 
timization methods directly. We are also able to adapt these methods to the case of HIS which has 
three titration states, as described below. 

Application of graph theory to energy minimization 

To begin, we restrict our attention to proteins without HIS so that each amino acid has only two 
choices for its “label”: protonated or deprotonated. Below, we will discuss a method to include 
HIS in the graph-cut algorithm described above. The first step is to create a weighted graph which 
holds all of the information from the energy function. We will call this the energy graph. Each 
amino acid will be represented by two vertices, one for each titration state. If there are N amino 
acids in the protein then there are 2 N vertices in the graph. Each (amino acid, configuration) pair, 
(/. Pj), has energy contributions from the difference between its intrinsic p K a and the current pH 
value, as well as the yff/ int (P,,P ; ) background and desolvation energies, which we combine into 
Ej(Pj) as in Eq. [7J This energy will be assigned as a weight on each vertex. Additionally, each 
pair of amino acids can interact in both of their titration states, but amino acid i in its deprotonated 
state cannot interact with itself in its protonated state. Therefore, for each pair of amino acids 
there are 4 edges, with a total of 4(^) = 2 N(N — 1) edges for the entire protein. Edge weights are 
given by the interaction energy for the particular amino acids and configurations, YiYjE mt {Pi,Pj) as 
in Eq. [T} Additional information including graph theory background, details for constructing the 
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energy graph, and example graphs are given in Supporting Information. 

The energy graph can be simplified by moving some edge weights to the vertices, and some 
vertex weights into a universal constant. Through this simplifying process, we will end up with 
the normal form of the energy graph 44 where all of the edge and vertex weights are non-negative 
and as many as possible have weight zero. The normal form energy graph is then transformed into 
an energy flow network using a procedure which guarantees that the minimum cut in the network 
will equal the minimum energy of the protein titration system. The graph transformation via sim¬ 
plification to normal form and representation as an energy flow network is described in Supporting 
Information. Note that, after these transformation processes, the edge weights still represent en¬ 
ergy values but are no longer interaction energies between the protonation states represented by 
the vertices in the edge. Instead, these energies represent interactions between groups in the new 
sparser graph created by the normalization process. 

The energy of a protonation state can be determined by choosing the vertices in the energy flow 
network corresponding to that particular protonation state, discarding all other vertices (and corre¬ 
sponding edges), and taking the sum of the edge and vertex weights that remain. Therefore, a narve 
approach to energy minimization would use this procedure to find the energy for all 2 N possible 
protonation states and choose the state which has the minimum energy. However, this brute-force 
algorithm has exponential complexity, making it infeasible for even moderate-size proteins. This 
selection of vertices and edges in the energy graph can also be represented through a graph cut in 
the energy flow network as detailed in Supporting Information. Briefly, a graph cut defines a given 
protonation state of the protein by assigning all vertices associated with that protonation state into 
one set, and all others into a second set. It can be shown that the cut value associated with this cut, 
the sum of edge weights for all edges from vertices in the first set to vertices in the second, plus 
the global constant from the normal form procedure, is exactly the energy of the associated state of 
the system. 44 Additional requirements are needed to ensure that the minimum cut in the network 
yields the minimum energy configuration. In a 2004 paper, Kolmogorov and Zabih 42 proved for 
pairwise-interacting states where all pairwise interactions are submodular, it is possible to find 
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the exact minimum energy state in polynomial time by computing the minimum cut on the flow 
network of the associated graph. For our titration state application, submodularity means that, for 
all pairs of amino acids, the interaction energy when they are in the same state is smaller than the 
interaction energy if they are in different states. In other words, submodularity implies that the 
sum of interaction energies for the singly-protonated states exceeds the sum of those for the the 
doubly-deprotonated and doubly-protonated states. 

Protein titration site interaction networks are not guaranteed to have submodular energies. 
However, it is still possible to use the graph-cut method to label the portion of the amino acids 
whose energy functions are submodular . 44 The remaining amino acids must be assigned by some 
other optimization method (e.g., Monte Carlo or brute force) on only the unassigned amino acids. 
In the Discussion, we discuss the practical implications of unassigned amino acids. We note that 
the number of unassigned residues is highly dependent on algorithms used to construct and cut the 
interaction energy graph. Because of this issue, we are not able to predict the number of unas¬ 
signed residues by just looking at the interaction energies. In future work, we plan to exploit the 
use of multiple different minimum cut algorithms to obtain the smallest number of unassigned 
amino acids, since all are fast to run but some yield fewer unassigned amino acids than others. 

Moving beyond two states per amino acid 

The discussion of the previous section was limited to minimizing two-state titratable systems using 
graph cuts. However, as mentioned in the introduction, not all titration sites can be represented 
simply by two states. In particular, HIS must be represented with three titration states: two neutral 
tautomers and one positively charged state. To accommodate this increase in states, we represented 
the neutral states of a HIS residue as two separate residues: HIS e and HIS 5 . If, in the result of the 
graph-cut algorithm, only one of these states is protonated, then that is the appropriate minimum- 
energy state. If both HIS e and HISg are protonated, then HIS is fully protonated (in the charged 
state). The doubly deprotonated state of HIS is excluded; this exclusion is enforced by an infinite 
interaction energy between the HIS e and HIS § states. The energetic differences between HIS e 
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and HIS § are calculated in PDB2PKA and currently only include electrostatic interactions with 
the environment - although it is known that these tautomers are not energetically equivalent in 
isolation. 45 

In order to separate each HIS residue into two separate residues we must define the interaction 
energy between a separated HIS residue and a non-HIS residue, based on the energies of the non- 
separated case. We also must define the interaction between two separated HIS residues. The 
simplest case involves interactions of HIS with a non-HIS residue. The details for constructing the 
associated energy graph are given in Supporting Information. 

Calculation details 

The 0 p values were calculated using Eqs. [3] or [6] based on the lowest-energy state obtained from 
the graph-cut method. The graph-cut algorithm was written in Python and executed on desktop 
computer running Macintosh OSX 10.6.8, with two 2.8 GHz 4 Core Xeon E5462 x2 processors 
with 20GB RAM. The resulting p K a values were compared against the Monte Carlo method in 
PDB2PKA. Both algorithms were run to calculate titration curves using pH values from 0 to 20 in 
increments of 0.1. 

Test proteins were selected from the PDB by identifying all entries with a single chain, no 
ligands or modified amino acids, and resolutions below 2.0 A. The resulting list was then re¬ 
fined by processing through WHATCHECK; 46 proteins with errors or excessive warnings were 
removed from the list. Finally, these proteins were run through the PDB2PKA software. Proteins 
with excessive energies (typically due to unresolved steric clashes or non-converged electrostatics 
calculations) were removed from the list. The final list of 82 proteins is provided in the Support¬ 
ing Information iPython notebook and shows the distribution of proteins with between 11 and 45 
titratable residues. We calculated titration curves for each of the titratable residues in 87 different 
proteins, resulting in 2337 predictions for each algorithm. 

Recall that in the case of non-submodular interaction energies, which is typically our setting, 
we may not be able to label all residues using the graph-cut algorithm. In the cases where we had 
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a significant number of unlabeled residues following the graph-cut method we needed to perform 
additional optimization. If there are fewer than (or equal to) 20 unassigned residues, we do a brute 
force search to find the minimum energy state. The “brute force” approach enumerates the 2" (for 
n unlabeled residues) possibilities to identify the one with the lowest energy. If there are more than 
20, we perform a Monte Carlo optimization sampling a subset of the 2" states. 

Results 

As discussed above, there is a possibility that some residues cannot be assigned titration states 
due to non-submodular energies. Figure [I] shows the number of residues which were not assigned 
using the graph-cut method as a function of protein size. 

The graph cut and PDB2PKA titration curves were compared by the mean absolute difference 
of the titration probability integrated over the pH range: 

1 r 20 , . 

6 il ~ 20 Jo P pDB2PKA P Gra P h cut | ^pH (8) 

with the results shown in Fig. [2j Results for the mean-squared differences are also provided in 
Supporting Information. Most titration curves show high levels of agreement with less than 5% 
error. The three curves with the worst agreement (4PGR TYR 167: 11% error, 4PGR ASP 195: 
10% error, and 3IDW GLU 51:7% error) are shown in the iPython notebook in Supporting Infor¬ 
mation. Further analysis of the few residues which showed large deviation between the PDB2PKA 
and graph cut titration curves showed very large variations in interaction energies. In particular, the 
set of all interaction energies between the residue in question and all other residues show a much 
wider range of energy values than other better-behaved residues. We believe that the significant 
energy outliers confound probabilistic Monte Carlo sampling and optimization and lead to the poor 
agreement for these few cases. 

The titration curves were used to derive pK a s by locating the pH values where the curves 
crossed 0.5. This approach was used in lieu of Henderson-Hasselbalch fitting because of the cou- 


12 



Figure 1: Distribution (over pH values and proteins) of residues left unlabeled, and thus needing 
brute force calculation, after applying the graph-cut algorithm. 
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(A) 



Figure 2: Comparison of titration curves with differences measured by i\ difference between 
PDB2PKA and graph-cut, as defined in the text (Eq. [8]). (A) Distribution of differences across 
the 2337 titration curves. (B) Distribution of differences for the titratable groups studied in this 
paper. 
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pled titration events observed in several proteins. Our simpler approach is primarily intended to 
compare the computational methods to each other - rather than generating p K a values for compari¬ 
son with experiment. Figure [3]compares the p K a values calculated by the graph-cut and PDB2PKA 
methods. The pK a values are strongly correlated with a Pearson r 2 = 0.996, a slope of 1.01, and 



Figure 3: Comparison of p K a values calculated with the graph-cut method and with PDB2PKA. o: 
arginine, □: aspartate, +: C-terminus, x: glutamate, *: HIS e ,o: HIS^, A: lysine, V: N-terminus, 
<: tyrosine. Line shows linear fit with p < 0.0001. 


an intercept of —0.19. A comparison of pK a shift^is presented in Supporting Information. 

1 A p K a shift is the difference between the observed p K a value and a model p K a for that residue type. 
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Discussion 


Our graph-based approach solves an optimization problem to determine the titration state with the 
lowest energy at a particular pH. This a potential source of error in our prediction of p K a values 
and titration curves because the titration probabilities should be calculated as thermodynamic av¬ 
erages over the entire ensemble of states. However, Figures [2] and [3] show that the results from our 
optimization approach and ensemble averages (from PDB2PKA) are very similar. This suggests 
a scarcity of low-energy states surrounding the titration transitions and is directly related to the 
accessibility of alternate charge states by thermal fluctuations. This phenomenon was described 
by Kirkwood and Shumaker 47 and continues to be an active area of theoretical and computational 
investigation. 48 The effects of thermal fluctuations are expected to be largest when the pH is close 
to the p K a of a site. 47 49 51 The close agreement between the optimization and Monte Carlo results 
in Figures |2]and|3]indicate that such fluctuations do not play a major role in the system we studied. 
However, future work will explore the possibility of using the graph-cut optimization as input to 
Monte Carlo sampling around the energy minimum to efficiently sample only the most relevant 
fluctuations. 

As described above, residues that violate the modularity condition for titration site interactions 
are not labeled by the graph cut algorithm and must be optimized by either brute force or Monte 
Carlo methods. The iPython notebook provided in Supporting Information illustrates the impact 
of unlabeled residues on execution time. Future research will focus on better understanding the 
influence of protein structure and energetics on this submodularity and to use more sophisticated 
optimization methods on the residues not labeled in the graph cut optimization procedure. 

Conclusions 

Most current titration state prediction algorithms suffer from either performance or sampling issues 
when searching over the 6 (2 ,v ) titration states associated with N titratable residues in a protein 
system. Sampling issues have historically been a major problem for the PDB2PQR/PDB2PKA 
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Monte Carlo approach for sampling titration states as well as other software packages. This paper 
presented a new polynomial-time algorithm for optimization of discrete titration states in protein 
systems. This algorithm was adapted from image processing and uses techniques from discrete 
mathematics and graph theory to restate the optimization problem in terms of “maximum flow- 
minimum cut” graph analysis. The interaction energy graph, a graph in which vertices (amino 
acids) and edges (interactions) are weighted with their respective energies, is transformed into a 
flow network in which the value of the minimum cut in the network equals the minimum free 
energy of the protein, and the cut itself encodes the state that achieves the minimum free energy. 
Because of its deterministic nature and polynomial-time performance, this algorithm has the po¬ 
tential to allow for the ionization state of larger proteins to be discovered. 

There are several other problems in macromolecular modeling that require optimization over 
an exponentially large space of states, including inverse protein folding and design 2 52 and rotamer 
sampling/selection. 53 In the future, we plan to extend this work to some of these other applications. 
However, the systematic extension of this approach to multi-state systems will be challenging. For 
example, in order to adapt this work to rotamer selection we need many more than two labels per 
amino acid and a more generalizable approach for decomposing pairwise interactions between ro¬ 
tamer states. In particular, we need a way of creating a flow network that will allow us to handle 
any number of labels per amino acid. There has been work in multi-label algorithms for computer 
vision, 41 43 54 56 however they all require that the energy functions satisfy submodularity as in¬ 
troduced earlier in this manuscript. However, algorithms such as a—expansion and a - /3-swap 
which will calculate minimum energy approximations without the need for such restrictions. 43 In 
future work, we plan to combine these approximations with the non-submodular case discussed in 
the rest of this paper in order to remove dependency on these energy function conditions. 
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Supporting Information Available 


In a separate supporting information document we provide more details on graph theory formal¬ 
ism, network flows, the creation of the energy graph, normal form, and energy flow network, and 
reduction of HIS from one amino acid with three states to two interacting amino acids with two 
states. Additionally we provide all of our data and an iPython notebook in order to reproduce our 
results, and dig further into the analysis of our algorithms. Finally, we also provide the graph cut 
Python code which can compute minimum energy states and titration curves using the algorithms 
described in this paper. 
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